    A deep learning method for automatic SMS spam classification: Performance of learning algorithms on indigenous dataset

    SMS, one of the most popular and fastest-growing GSM value-added services worldwide, has attracted unwanted messages, known as SMS spam. SMS spam has significant effects on both users and service providers, causing a large gap in trust between the two parties. This article presents a deep learning model based on a bidirectional LSTM (BiLSTM) and compares its results with several state-of-the-art machine learning (ML) algorithms on two datasets: our newly collected, expanded dataset (ExAIS_SMS) and the popular UCI SMS dataset. The study evaluates the performance of diverse learning models on the new dataset using the following metrics: true positives (TP), false positives (FP), F-measure, recall, precision, and overall accuracy. The average accuracy of the BiLSTM model was moderately better than that of some of the ML classifiers, and the experimental results improved significantly over the ground truth results after some of the parameters were fine-tuned. The BiLSTM model attained an accuracy of 93.4% on the ExAIS_SMS dataset and 98.6% on the UCI dataset. On the ExAIS_SMS dataset, the state-of-the-art ML classifiers Naive Bayes, BayesNet, SOM, decision tree, C4.5, and J48 achieved accuracies of 89.64%, 91.11%, 88.24%, 75.76%, 80.24%, and 79.2%, respectively. In conclusion, our proposed BiLSTM model showed significant improvement over traditional ML classifiers. To further validate the robustness of our model, we applied it to the UCI dataset, and the results showed optimal performance in classifying SMS spam messages on accuracy, precision, recall, and F-measure.
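
    The abstract scores each classifier with TP, FP, precision, recall, F-measure, and accuracy. A minimal sketch of how these follow from true and predicted labels is shown below. It assumes that "spam" is the positive class and that TP and FP are reported as rates, as in Weka-style summaries; that reading is an assumption. The function name `spam_metrics` and the sample labels are hypothetical.

```python
from typing import Dict, Sequence


def spam_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                 positive: str = "spam") -> Dict[str, float]:
    """Confusion-matrix metrics for a binary spam/ham classifier (sketch)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # identical to the TP rate
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "tp_rate": recall,
        "fp_rate": fp / (fp + tn) if fp + tn else 0.0,  # assumed FP-rate reading
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": (tp + tn) / len(pairs),
    }


# Illustrative labels only, not data from the study.
truth = ["spam", "ham", "spam", "ham", "spam"]
preds = ["spam", "spam", "ham", "ham", "spam"]
print(spam_metrics(truth, preds))
```

    The F-measure here is the balanced F1 score, the harmonic mean of precision and recall. A weighted F-beta would need an extra parameter that the abstract does not mention.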

    Study of the Yahoo-Yahoo Hash-Tag Tweets Using Sentiment Analysis and Opinion Mining Algorithms

    Mining opinions on social media microblogs offers opportunities to extract meaningful public insight on trending issues such as “yahoo-yahoo”, which in Nigeria is synonymous with cybercrime. In this study, content analysis of selected historical tweets carrying the “yahoo-yahoo” hashtag was conducted for sentiment analysis and topic modelling. A corpus of 5,500 tweets was obtained and pre-processed using a pre-trained tweet tokenizer. Valence Aware Dictionary and sEntiment Reasoner (VADER), the Liu Hu method, Latent Dirichlet Allocation (LDA), Latent Semantic Indexing (LSI), and Multidimensional Scaling (MDS) graphs were then used for sentiment analysis, topic modelling, and topic visualization. Results showed that the corpus had 173 unique tweet clusters and 5,327 duplicate tweets, and that “yahoo” had a frequency of 9,555. Further validation against the mean sentiment scores of ten volunteers returned the following R and R² values: 0.8038 and 0.6402 for Human vs. VADER, 0.5994 and 0.3463 for Human vs. Liu Hu, and 0.5999 and 0.3586 for Liu Hu vs. VADER. VADER outperformed Liu Hu in sentiment analysis, while LDA and LSI returned similar results in topic modelling. The study confirms VADER’s performance on unstructured social media data containing non-English slang, conjunctions, emoticons, etc. It also shows that emojis are more representative of sentiment in tweets than the text.
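
    The validation step compares the volunteers' mean sentiment scores with each tool's scores using R and R². The sketch below treats R as the Pearson correlation and R² as its square, which is an assumption. The reported pairs are not exact squares of each other (0.8038² is about 0.646, not 0.6402), so the authors' R² definition may differ. The helper name `r_and_r2` and the scores are hypothetical.

```python
import math
from typing import Sequence, Tuple


def r_and_r2(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Pearson correlation R between two score lists, and R squared."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # Raises ZeroDivisionError if either list is constant (R is undefined).
    r = sxy / math.sqrt(sxx * syy)
    return r, r * r


# Hypothetical per-tweet scores: mean of human raters vs. a lexicon tool.
human = [1.0, 2.0, 3.0, 4.0]
tool = [1.0, 3.0, 2.0, 4.0]
print(r_and_r2(human, tool))
```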

    Malignant skin melanoma detection using image augmentation by oversampling in nonlinear lower-dimensional embedding manifold

    The continuous rise in skin cancer cases, especially malignant melanoma, has led to a high mortality rate among affected patients because of late detection. Several challenges limit the success of skin cancer detection:

    - small datasets (data scarcity) and unavailability of data
    - noisy data
    - inconsistent image sizes and resolutions
    - unreliable labels (ground truth)
    - class imbalance in skin cancer datasets

    This study presents a novel data augmentation technique based on a covariant Synthetic Minority Oversampling Technique (SMOTE) to address data scarcity and class imbalance and improve melanoma detection. The method oversamples data in a nonlinear lower-dimensional embedding manifold to create synthetic melanoma images. It is used to generate a new skin melanoma dataset from dermoscopic images in the publicly available PH2 dataset, and the augmented images were used to train the SqueezeNet deep learning model.

    In binary classification, the experimental results show a significant improvement in melanoma detection, with an accuracy of 92.18%, sensitivity of 80.77%, specificity of 95.1%, and F1-score of 80.84%. We also improved the multiclass classification results:

    - melanoma detection: 89.2% sensitivity, 96.2% specificity
    - atypical nevus detection: 65.4% sensitivity, 72.2% specificity
    - common nevus detection: 66% sensitivity, 77.2% specificity

    The proposed classification framework outperforms some state-of-the-art methods in detecting skin melanoma.
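
    The augmentation builds on SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below shows only that classic SMOTE step on feature vectors. It is not the paper's covariant variant. Oversampling in a nonlinear embedding and mapping the results back to images would also need an embedding and an inverse mapping, neither of which is shown. The function `smote` and the 2-D points are hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Classic SMOTE: interpolate between a minority sample and one of its
    k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nn)])
    return synthetic


# Hypothetical 2-D embeddings of minority-class (melanoma) images.
melanoma = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(smote(melanoma, n_new=3, k=2))
```

    Each new point lies on a line segment between two existing minority samples. SMOTE therefore fills in the minority region rather than extrapolating beyond it.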

    Cloud Multi-Tenancy: Issues and Developments

    Cloud Computing (CC) is a computational paradigm that provides pay-per-use services to customers from a pool of networked computing resources provisioned on demand, so customers do not need to worry about infrastructure or storage. Cloud Service Providers (CSPs) make custom-built applications available to customers online, and organisations and enterprises can build and deploy applications on platforms provided by the CSP. Scalable storage and computing resources are also made available to consumers at a cost. Cloud Computing takes virtualization a step further: through virtual machines, it allows several customers to share the same physical machine. Numerous customers can also share applications provided by a CSP, and this sharing model is known as multi-tenancy. Multi-tenancy has drawbacks, but it is highly desirable because of its cost efficiency. This paper presents a comprehensive study of the existing literature on issues and developments in cloud multi-tenancy using reliable methods. It examines recent trends in the area and provides a guide for future research. The analysis was based on the following question: what are the current trends and developments in cloud multi-tenancy? Existing publications in this area were analysed, including journals, conference papers, white papers, and articles in reputable magazines. The expected result of this review is the identification of trends in cloud multi-tenancy, which will benefit both prospective cloud users and cloud providers.

    An Ensemble Learning Model for COVID-19 Detection from Blood Test Samples

    Current research on applying artificial intelligence (AI) methods to the diagnosis of COVID-19 has proven indispensable, with very promising results. Despite these results, real-time detection of COVID-19 using reverse transcription polymerase chain reaction (RT-PCR) test data still has limitations. These include limited datasets, imbalanced classes, high misclassification rates, and the need for specialized research to identify the best features and so improve prediction rates. This study investigates the ensemble learning approach and applies it to develop prediction models for effective detection of COVID-19 from routine laboratory blood test results. The resulting ensemble machine learning-based COVID-19 detection system is intended to help clinicians diagnose the virus effectively. The experiment used custom convolutional neural network (CNN) models as a first-stage classifier and 15 supervised machine learning algorithms as second-stage classifiers: K-Nearest Neighbors, Support Vector Machine (linear and RBF), Naive Bayes, Decision Tree, Random Forest, Multilayer Perceptron, AdaBoost, ExtraTrees, Logistic Regression, Linear and Quadratic Discriminant Analysis (LDA/QDA), Passive, Ridge, and Stochastic Gradient Descent classifiers. On the San Raffaele Hospital dataset, an ensemble learning model based on DNN and ExtraTrees achieved a mean accuracy of 99.28% and an area under the curve (AUC) of 99.4%. AdaBoost also achieved a mean accuracy of 99.28%, with an AUC of 98.8%. A comparison with other state-of-the-art approaches on the same dataset shows that the proposed method outperforms several other COVID-19 diagnostic methods.
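
    The models are compared by mean accuracy and area under the ROC curve (AUC). One standard way to compute AUC is as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting as one half. The sketch below uses that pairwise definition. The function name `roc_auc` and the scores are hypothetical, and the abstract does not say how the authors computed AUC.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative),
    with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical second-stage outputs: 1 = COVID-19 positive, 0 = negative.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))
```

    The pairwise loop is quadratic in the number of cases, which is fine for a sketch. Larger evaluations usually use an equivalent rank-based formula.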

    IMPLEMENTATION OF A BIMODAL BIOMETRIC ACCESS CONTROL SYSTEM FOR DATA CENTER

    Biometrics have become one of the surest ways to secure access to rooms that hold vital assets, such as data centers storing valuable information. This paper designs and implements an access control system for a data center that uses the fingerprint and iris traits of the same person, which is known as a bimodal biometric system. The system integrates hardware components, such as a PIC18F452 microcontroller and fingerprint and iris sensors, with software written in C and a MySQL interface. On testing, the system was found to improve the security and reliability of access control management in the data center.

    An improved random bit-stuffing technique with a modified RSA algorithm for resisting attacks in information security (RBMRSA)

    Recent innovations in network applications and the internet have made data and network security central to the development of data communication systems. Cryptography is one of the most powerful tools for ensuring data and network security. In cryptography, randomizing encrypted data increases the security level, and it also increases the computational complexity of the algorithms involved. This study provides an encryption scheme that ensures confidentiality and integrity by combining two algorithms. The first is the well-known RSA algorithm with a 1024-bit key. The second is an enhanced bit-insertion algorithm that strengthens RSA against different attacks. The security of classical RSA has declined regardless of key length because of advances in computing technology and hacking techniques. To address these weaknesses, this paper enhances the security of RSA against different attacks and increases the degree of diffusion without increasing the key length. The security of the scheme was compared with classical 1024-bit RSA using mathematical evaluation proofs. The experimental results were also compared with classical 1024-bit RSA, using the avalanche effect (%) and computational complexity as performance metrics. The results show that RBMRSA is more secure than classical RSA, at the cost of longer execution time.
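
    The avalanche effect is usually reported as the percentage of ciphertext bits that change when the input changes slightly. The sketch below assumes that standard definition. It applies the metric to textbook RSA with tiny primes, which is insecure and chosen only to keep the numbers small. The paper's random bit-insertion step is not reproduced, because the abstract does not specify it. `avalanche_percent` and the toy key values are hypothetical.

```python
def avalanche_percent(c1: int, c2: int, bits: int) -> float:
    """Percentage of the `bits` output bits that differ between c1 and c2."""
    return 100.0 * bin(c1 ^ c2).count("1") / bits


# Textbook RSA with toy primes: insecure, used only to illustrate the metric.
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse of E (Python 3.8+)
BITS = N.bit_length()  # every ciphertext is < N, so it fits in BITS bits


def encrypt(m: int) -> int:
    return pow(m, E, N)


def decrypt(c: int) -> int:
    return pow(c, D, N)


m1 = 65
m2 = m1 ^ 1  # flip the lowest plaintext bit
print(avalanche_percent(encrypt(m1), encrypt(m2), BITS))
```

    An ideal cipher changes about half of the output bits for a one-bit input change. A value near 50% is therefore the usual target when comparing schemes on this metric.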

    A BIMODAL BIOMETRIC BANK VAULT ACCESS CONTROL SYSTEM

    Security is the most important aim of a bank vault system, and banks could go bankrupt if the vault’s security were compromised. This paper proposes bimodal biometrics (fingerprint and iris) as a means of ensuring the full integrity of the bank’s vault system and so further reducing the rate of compromise and theft. A scanner captures the fingerprint and iris of authorized users. The captured images are segmented, normalized, and converted into templates, which are stored in a database along with the users’ particulars. The accuracy of the system is measured in terms of sample acquisition error and recognition performance, using the False Accept Rate (FAR), False Identification Rate (FIR), and False Reject Rate (FRR). The results show that the proposed system is very effective.
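
    FAR and FRR are computed from verification attempts at a chosen decision threshold. FAR is the fraction of impostor attempts that are accepted, and FRR is the fraction of genuine attempts that are rejected. The abstract does not say how the fingerprint and iris results are combined. The sketch below assumes a simple AND rule, under which both traits must pass their own threshold. The score scale, thresholds, and the names `accept` and `far_frr` are hypothetical, and FIR is omitted because its definition is not given.

```python
from typing import List, Tuple

# (fingerprint score, iris score); higher means a closer match (assumed scale)
Attempt = Tuple[float, float]


def accept(attempt: Attempt, t_finger: float, t_iris: float) -> bool:
    """Hypothetical AND-rule fusion: both traits must pass their threshold."""
    finger, iris = attempt
    return finger >= t_finger and iris >= t_iris


def far_frr(genuine: List[Attempt], impostor: List[Attempt],
            t_finger: float, t_iris: float) -> Tuple[float, float]:
    """FAR = accepted impostor attempts / impostor attempts;
    FRR = rejected genuine attempts / genuine attempts."""
    false_accepts = sum(accept(a, t_finger, t_iris) for a in impostor)
    false_rejects = sum(not accept(a, t_finger, t_iris) for a in genuine)
    return false_accepts / len(impostor), false_rejects / len(genuine)


# Illustrative similarity scores, not measurements from the paper.
genuine = [(0.9, 0.8), (0.7, 0.4), (0.95, 0.9)]
impostor = [(0.7, 0.7), (0.2, 0.9), (0.1, 0.1)]
print(far_frr(genuine, impostor, t_finger=0.6, t_iris=0.6))
```

    Raising the thresholds trades a lower FAR for a higher FRR. An AND rule favours rejecting impostors, which suits a vault where a false accept is the costlier error.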

    BiLSTM with data augmentation using interpolation methods to improve early detection of Parkinson disease

    Series: Annals of Computer Science and Information Systems, vol. 21
    The lack of dopamine in the human brain causes Parkinson disease (PD), a degenerative disorder common among older people worldwide. The disease is often detected late rather than before the first clinical diagnosis, which has led to an increased mortality rate. Research on the early detection of PD has encountered challenges such as small dataset size, class imbalance, overfitting, high false detection rates, and model complexity. This paper aims to improve early detection of PD with machine learning, using data augmentation for very small datasets. We propose using spline interpolation and Piecewise Cubic Hermite Interpolating Polynomial (Pchip) interpolation to generate synthetic data instances. We also investigate reducing feature dimensionality for effective, real-time classification, taking into account the computational complexity of implementation on real-life mobile phones. For classification, we use a Bidirectional LSTM (BiLSTM) deep learning network and compare the results with traditional machine learning algorithms such as Support Vector Machine (SVM), Decision Tree, Logistic Regression, KNN, and Ensemble Bagged Trees. For experimental validation, we use the Oxford Parkinson disease dataset of 195 data samples, which we augmented with 571 synthetic data samples. Even with a holdout of 90%, the BiLSTM model still recognized PD effectively. Its average accuracies over a ten-round experiment using 22 features were 82.86% on the original dataset, 97.1% on the spline-augmented dataset, and 96.37% on the Pchip-augmented dataset. The proposed data augmentation schemes significantly (p < 0.001) improved the accuracy of PD recognition on a small dataset for both classical machine learning models and BiLSTM.
    Faculty of Informatics, Kaunas University of Technology; Vytautas Magnus University
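
    Pchip builds a piecewise cubic Hermite interpolant whose knot slopes are chosen to preserve the shape of the data, so it does not overshoot between samples. The abstract does not say how the interpolants are applied to the feature table. The sketch below assumes one plausible scheme: for each feature, fit Pchip over the sample index within one class and evaluate it halfway between consecutive samples, yielding one synthetic row per gap. Interior slopes use the weighted harmonic mean (Fritsch-Butland) rule. The end slopes are simple one-sided differences, which simplifies the three-point end rule used by libraries such as SciPy. All function names are hypothetical, and the spline variant is not shown.

```python
from bisect import bisect_right
from typing import List, Sequence


def pchip_slopes(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Shape-preserving knot slopes; xs strictly increasing, len >= 2."""
    n = len(xs)
    h = [xs[k + 1] - xs[k] for k in range(n - 1)]
    delta = [(ys[k + 1] - ys[k]) / h[k] for k in range(n - 1)]
    d = [0.0] * n
    d[0], d[-1] = delta[0], delta[-1]  # simplified one-sided end slopes
    for k in range(1, n - 1):
        if delta[k - 1] * delta[k] > 0:  # same sign: weighted harmonic mean
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / delta[k - 1] + w2 / delta[k])
        # otherwise a local extremum or flat segment: slope stays 0
    return d


def pchip_eval(xs: Sequence[float], ys: Sequence[float],
               d: Sequence[float], x: float) -> float:
    """Evaluate the cubic Hermite interpolant at x."""
    k = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[k + 1] - xs[k]
    t = (x - xs[k]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1  # Hermite basis functions on [0, 1]
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * ys[k] + h10 * h * d[k] + h01 * ys[k + 1] + h11 * h * d[k + 1]


def augment_midpoints(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical scheme: per feature, fit Pchip over sample index and
    evaluate halfway between consecutive samples (len(samples) - 1 new rows)."""
    positions = list(range(len(samples)))
    n_features = len(samples[0])
    synthetic = [[0.0] * n_features for _ in range(len(samples) - 1)]
    for j in range(n_features):
        column = [row[j] for row in samples]
        slopes = pchip_slopes(positions, column)
        for k in range(len(samples) - 1):
            synthetic[k][j] = pchip_eval(positions, column, slopes, k + 0.5)
    return synthetic


# Illustrative rows with 2 features each, not Oxford dataset values.
rows = [[0.0, 10.0], [1.0, 10.0], [1.0, 12.0], [2.0, 12.0]]
print(augment_midpoints(rows))
```

    Because the slopes are zero wherever the data are flat or change direction, the synthetic values stay within the range of the neighbouring samples. That property keeps the augmented features physiologically plausible.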